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ASSESSING THE ASSOCIATION BETWEEN TRENDS IN 
A BIOMARKER AND RISK OF EVENT WITH 
AN APPLICATION IN PEDIATRIC HIV/ AIDS 

By Elizabeth R. Brown 1 

University of Washington 

We present a new joint longitudinal and survival model aimed 
at estimating the association between the risk of an event and the 
change in and history of a biomarker that is repeatedly measured over 
time. We use cubic B-splines models for the longitudinal component 
that lend themselves to straight-forward formulations of the slope 
and integral of the trajectory of the biomarker. The model is applied 
to data collected in a long term follow-up study of HIV infected in- 
fants in Uganda. Estimation is carried out using MCMC methods. 
We also explore using the deviance information criteria, the condi- 
tional predictive ordinate and ROC curves for model selection and 
evaluation. 

1. Introduction. In longitudinal studies it is common to monitor one or 
more biomarkers repeatedly over time while following participants until the 
occurrence of an event. Researchers are often interested in examining both 
the repeated measures and the time to event to gain an understanding of 
the underlying disease process. Additionally, the risk of an event may not 
depend solely on the level of the biomarker but also on the rate at which that 
biomarker is changing or its past average level. For example, two patients 
may present with the same biomarker value, but one patient's biomarker 
trajectory may be increasing while the other's is remaining constant. Their 
prognosis may appear to be be the same if only the current value is accounted 
for, but in fact the patient with the increasing biomarker may be at higher 
risk. In this paper we present a model to estimate the association between 
the risk of an event and the current value, as well as the rate of change or 
history of a longitudinally sampled biomarker. We illustrate the approach 
from a sub-study of HIVNET 012 [Guay et al. (1999); Jackson et al. (2003)] 
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of HIV disease progression in Ugandan children who acquired HIV vertically, 
either in utero, during delivery or via breastmilk. In this sub-study children 
who tested positive before 18 months of age were followed until five years 
of age with blood samples drawn every 6 months. Prom these samples, the 
lab determined viral load, total lymphocyte count (TLC), CD4 percent and 
other related biomarkers. One aim was to determine the predictive value 
of these measures for time until death. In this manuscript we explore the 
association between change in and history of these markers and the risk of 
death. Overall, 128 children were followed after their first positive HIV test 
with lab measurements taken at regular intervals. Of these 128 infants, 70 
died during follow-up. 

In estimating the association between trends in a biomarker or set of 
biomarkers and the risk of an event, we face two important and distinct 
challenges. The first is selecting the correct model when the biomarker is 
collected in discrete time with error. The second is to determine how to 
obtain different summaries of the biomarker(s) and include them in the 
time to event model. 

The first issue and its resolution through joint modeling of the time to 
event and longitudinal marker has been studied extensively as summarized 
by Tsiatis and Davidian (2004) when dealing with the current level of a 
biomarker and the risk of an event. In summary, survival analyses with 
time-dependent covariates can be biased if we simply include the raw mea- 
surements in the survival analysis [Prentice (1982)]. Joint longitudinal and 
survival models resolve this issue by modeling the biomarker process over 
time and including subject-specific parameters from the longitudinal model 
as covariates in the survival model. These same issues exist when trying to 
include other summaries of the biomarker process in the survival model, and 
may, in fact, be exacerbated. 

Wulfsohn and Tsiatis (1997) and Faucett and Thomas (1996) introduced 
likelihood-based methods for analyzing a longitudinal marker and its asso- 
ciation with the time to event simultaneously. This class of models is not 
restricted to linking the time-varying value of the longitudinal marker to 
the time- varying hazard through a regression on the current value of longi- 
tudinal model. They may instead include other information from the longi- 
tudinal trajectories summarized by the longitudinal model. Several authors 
have proposed models that group the trajectories into latent classes, then 
link the hazard and the longitudinal model by the latent class [Lin et al. 
(2002); Proust-Lima, Letenneur and Jacqmin-Gadda (2007); Han, Slate and 
Pena (2007); Proust-Lima et al. (2009)]. These models can help characterize 
longitudinal trajectories that may indicate a higher risk for event. In these 
models the latent class for the trajectory of the biomarker is defined based 
on the entire follow-up for the biomarker, which may have the drawback of 



TRENDS IN A BIOMARKER AND RISK OF EVENT 



3 



using information about the biomarker from the future to estimate risk in 
the present. 

Other descriptors from the longitudinal model have also been used to 
link the trajectory of the biomarker to risk of event. Yu, Taylor and San- 
dler (2008) developed individual risk prediction models for prostate cancer 
recurrence based on PSA trajectories. PSA was modeled using a nonlin- 
ear exponential decay and exponential growth model. The current value as 
well as slope (first derivative) of the PSA trajectory were included as time- 
varying covariates in the hazard model. Ye, Lin and Taylor (2008) presented 
a two-stage regression calibration approach for modeling longitudinal and 
time-to-event data. Their longitudinal model included smoothing splines at 
the population level with individual deviations from the smoothing spline 
allowed through a mean integrated Wiener process and subject-specific 
slopes and intercepts. Both the current value and slope of the subject-specific 
trajectories were included as covariates in the hazard model. 

In this manuscript we extend the work of Brown, Ibrahim and DeGruttola 
(2005) who used cubic B-splines to model the impact of multiple biomarkers 
on time to event to include the slope and integral of the cubic B-spline mod- 
els as time-varying covariates in the hazard model. They showed that cubic 
B-splines provided flexibility in the biomarker model that simple parametric 
models could not. Because cubic B-spline trajectory models and their slopes 
and integrals are linear functions of the parameters that do not increase 
exponentially with time, they avoid computational instability. Additionally, 
because the basis functions of the cubic B-splines weight more heavily on 
local (in time) information for estimating the value of the trajectory, esti- 
mates of the slope early in follow-up do not rely heavily on values of the 
biomarker observed late in follow-up. 

The paper proceeds as follows. In the next section we review cubic B-spline 
models for longitudinal data and outline the model associating change and 
cumulative exposure with time to event. Next, we discuss estimation and 
model selection procedures. We then show an example from HIVNET 012. 
We conclude with a discussion. 

2. The joint longitudinal and survival model. In this section we describe 
a joint longitudinal and survival model to estimate the association between 
the rate of change in a biomarker or cumulative history of a biomarker and 
the risk of an event. We begin with a description of the notation and a 
review of the longitudinal cubic B-spline model, including expressions for 
the first derivatives and integrals. We then introduce the model linking the 
biomarker and its rate of change or cumulative history to the risk of event. 

2.1. The longitudinal model. Let yiji denote the ith, i = 1,...,N, sub- 
ject's jth, j = 1, . . . ,mj, observation of the Ith, I = 1, . . . , L, biomarker at 
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time tij < T, where T denotes the end of follow-up. We define an observa- 
tion at time ty to be a function of the true underlying trajectory i/j(Uj) plus 
error, 

Viji = ipi(tij) +eijh 

where the errors are independent and normally distributed such that (e«i, . . . , 
e ijL,)' ~ Ni(0, £), where Ni(a, b) is the ^-dimensional multivariate normal dis- 
tribution with mean vector a and covariance matrix b. Brown, Ibrahim and 
DeGruttola (2005) modeled the true, but unobserved, trajectory using cubic 
B-splines such that 

<? 

(2.1) ip l (t) = J2PilkB k (t), 

k=l 

where {-£?/%(•)} is a q-dimensional basis for spline functions on [0, T] with 
a fixed knot sequence, u = (ux, . . . , u q+ 4) and flu = (flm, ■ ■ ■ , fliiq)' is a vec- 
tor of subject-specific parameters of length q that determine the shape of 
the ith subject's trajectory. We assume a hierarchical model where flu ~ 
Ni(boi + X'iCthVoi). In this model the effect of the covariates is modeled at 
the population level where ai is a vector of parameters of length p linking 
the vector of baseline covariates X, to the longitudinal outcome and bo is the 
vector of length q of the mean of the coefficients for the kth basis function 
when Xn, . . . ,Xi p = 0. Brown, Ibrahim and DeGruttola (2005) showed the 
merits (better mixing in the Gibbs sampler and more flexible effect of the 
covariate on the longitudinal outcome) of modeling the effect of covariates 
on the longitudinal model in this level of the hierarchy. As a further exten- 
sion, we allow for a covariance structure for the spline coefficients where Voz 
is the q x q covariance matrix of the coefficients of the basis functions. 

An individual's contribution to the likelihood of the longitudinal marker 
can be expressed as 

1 f 1 mi 

(2.2) p(Yi\E,Pi) oc =exp --£(1^ - ^)) 'iT 1 ^- - ^)) 

I I I j=i 

where = (y^i, . . .,y ijL )' and ipfcj) = (Y>i(%), • • ■ ,Y>z (%))'■ 

2.2. The joint model. We next review the form of the joint model when 
we are only interested in modeling the relationship between the level of the 
marker at time t and the hazard at time t. We then show how the spline 
model of the trajectory can be used to describe the rate of change in and 
history of the biomarker. Next, we incorporate these functionals into the 
hazard, thus extending the joint model to estimate the impact of the rate 
of change and history on the risk of an event. 
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The usual joint longitudinal and survival model assumes proportional 
hazards, and the effect of the trajectories of the biomarkers on the hazard 
is modeled as 

(2.3) A(i) = Ao(i)exp( 7 V(i) + Z;C), 

where Xo(t) is the baseline hazard at time t, 7 is a parameter vector of length 
L linking the trajectory vector at time t to the hazard at time t, and £ is a 
vector of parameters of length p z linking the vector of baseline covariates Zi 
to the hazard. Taking a likelihood approach requires some specification of 
the baseline hazard. Here, we specify a piecewise constant hazard allowing 
for approximately 8-10 events in each interval, where 

X (t) =\j, Wj <t <w j+1 ,j = 1,..., J, 

where the Wj's are the jump points with w\ = and w j+i = 00. 
Then the cumulative hazard, 

f ' X{u)ei'^ u)+Z ^ du, 
Jo 

can be rewritten as 

J 
i=i 

where 

(2.4) H ij (P, 1 ,X)=I{s i > Uj-^Xj / e 7iK«)+«JC du 

JUj — l 

and I{si > Uj— 1} is an indicator function which equals 1 if the event time 
occurs in or later than the jth interval and otherwise. The integral in (2.4) 
does not have an analytical solution for the trajectory defined by cubic B- 
splines. Instead, for computational ease and speed, we use Gaussian quadra- 
ture to approximate it. 

Extending the model to include the relationship between another function 
or functions of the time- varying covariates and the hazard requires adding 
another term to the model. If we are interested in the relationship between 
rate of change of the biomarker and the hazard, we include the first deriva- 
tive of ip{t). If we are interested in the relationship between the cumulative 
history of the biomarker and the hazard, we include the integral of V'(^)- 

For clarity, we drop the subscripts for definition of the first derivative and 
integral. The first derivative of the trajectory function as shown by de Boor 
[(2001), page 116] can be expressed as 



(2.5) 



G 



E. R. BROWN 



where B^>*{t) is a vector of length q with the jth element equal to - 



b(2) ^ ° «j+s-«i 

M . 3 ±— u ■ x i an d B^ 2 \t) is the quadratic B-spline based on the same sequence 
of knots as the original B-spline in (2.1) with the first and last knot removed. 
Equation (2.5) is written as a linear function of the elements of however, 
it is a quadratic B-spline and still retains many of the desirable properties 
of B-splines mentioned earlier. 

The general idea for calculating the integral of the cubic B-spline was laid 
out by de Boor [(2001), page 128]. We derived the integral of the trajectory 
up to time t to be 

(2.6) /* i>{v) dv = (t) = £* (4) (t)'0, 



where B*^ (t) is a vector of length q with jth element equal to Y^k!=j+ 1 (t) 

Uj ), and {-Bi. (•)} is the vector of q + 1-dimensional basis of a quar- 
tic B-spline based on the knot vector u augmented by two arbitrary knots, 
uq < u\ and u q+ s > u q+ ^. 

To link the rate of change and history of the trajectories to the hazard, 
we use the following regression model: 

(2.7) X(t) = Ao(t) exp{ 7 V(t) + 7>'(*) + j'^Ht) + Z[Q, 

where 7 S is the L-length vector of parameters linking the L-length vector 
of the slopes of the trajectories at time t, if)'(t) to the hazard at time t and 
7/j is the L-length vector of parameters linking the L-length vector of the 
cumulative histories of the trajectories up to time t, ^ _1 (^) to the hazard 
at time t. 

As in the case where only the trajectory value at time t is linked to the 
hazard at time t, the cumulative hazard can be written as 

J 

where 

H ij{P,l,ls,lh,>) 

(2.8) = I{s i >u j ^ 1 }X j 

min(u_j ,Si) 

exp{j'ip(u) +^' s tp'{u) + r y' h ip ( '~ 1 \u)} du 

■3-1 

and 7s = (7 s i, . . . , ~f aL )' and j h = (j hl , ^ hL )' . Because equation (2.8) does 
not have a closed form solution, we evaluate it numerically using Gaussian 
quadrature. 



TRENDS IN A BIOMARKER AND RISK OF EVENT 



7 



The distribution for an individual's time to event, Sj, given the trajectory 
function and choice of hazard regression, is given by 



f( Si , ISM) = A( Si r exp{^( 7 V(si) + 7>'(*i) + 7^ (_1) (^) + 

(2.9) 



where is the censoring indicator for subject i. 

3. Estimation. We can now express the ith subject's contribution to the 
joint likelihood function as 



f(Yi,Si,Vi) = f{si,Vi\Yi) x f(Yi), 

f(Y h Si , Vi) cx X( Si r exp{^( 7 V(^) + 7>' (*0 + 7W> (_1) (*0 + 40} 



We then specify the prior distributions for the parameters in the likelihood 
as follows. We assume (7,7s, 7/0' ~ ^3l{Gq, G\), X ~ Wishartj^S^ 1 ) and 
Xj ~ gamma(doi) d±j). We may also specify priors on the hyperparameters of 



P, ai~N p (C ol ,C u ), b i~N p (A ol ,Au) and V ol 1 ~ Wishart^ V JS~^). Here, 



Wishart Jy (S'~ 1 ) denotes the Wishart distribution with v degrees of freedom 
and scale matrix S -1 and gamma(a, b) denotes the gamma distribution with 
shape parameter a and scale parameter b. The prior distributions were cho- 
sen to be as general as possible while still being proper and conjugate to the 
likelihood when possible. 

We use the Gibbs sampler [Gelfand and Smith (1990)] to sample from 
the posterior distribution of the parameters. Because the full condition- 
als of 7, 7 S , 7^ and C are log-concave, we use adaptive rejection sampling 
(ARS) [Gilks and Wild (1992)] to sample from them. We use the slice sam- 
pler [Neal (2003)] to sample the random effects, f3ij,j = l,...,q,i = l,...,N. 
The full conditionals of A, S, V and b are sampled from directly. The esti- 
mation procedure is implemented in R [R Development Core Team (2006)] 
and C with code available from the author. 

4. Model comparison. We examine two statistics for model comparison, 
the deviance information criterion (DIC) [Spiegelhalter et al. (2002)] and the 
conditional predictive ordinate (CPO) [Gelfand, Dey and Chang (1992)]. 
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The DIC is a measure of the deviance penalized for the number of pa- 
rameters in the model which may be difficult to ascertain in hierarchical 
models and is therefore estimated. The DIC is the sum of the deviance es- 
timated using the posterior estimates of the parameters, D(Q), and twice 
the effective number of parameters, po- The effective number of parameters 
is estimated by po = D(Q) — D(Q), where D(Q) is the posterior mean of 
the deviance (the average of the deviances calculated using the estimated 
parameters at each step of the MCMC sampler). For the model presented 
in this paper, the DIC can be expressed as 

-. G N N 

^ = 2gEEiog{/(s^^ ! |e( 5 ')}-^io g {/( Sil , 1 ,y 1 |e)}, 

g=l i=l i=l 

where 0^ denotes the parameter samples at the 5th, g = 1, . . . ,G, iteration 
of the Gibbs sampler and O represents the means of the posterior samples. 
A smaller DIC indicates a better fit when comparing models. 
For the ith observation, the CPO statistic is defined as 

(4.1) cpo l = f( Sl ,v i ,Y l \D^) = |/( Si ,i/ i ,y l |e,A)vr(e| J D(- i ))de, 

where denotes the model parameters, Di denotes the ith patient's covari- 
ate data and D^~^ denotes the covariate data for all the patients except 
the ith patient. We cannot obtain a closed form solution for (4.1); however, 
Chen, Shao and Ibrahim [(2000), Chapter 10] show that a Monte Carlo 
approximation of (4.1) is 

CPOi= {g g/to.^ieo,))) ' 

where 0^ denotes the parameter samples at the 5th, g = 1,...,G, itera- 
tion of the Gibbs sampler. A large CPO value indicates a better fit. We 
can compare different models using the sums of the logs of the CPOs of the 
individual observations, also known as the log pseudo-marginal likelihood 
(LPML). Models with greater LPML = J2^g(CPOi) values will indicate a 
better fit. Both the DIC and LPML are designed to measure a model's pre- 
dictive ability, although the DIC is based on a penalized deviance approach 
and the LPML is based on a cross-validated approach. 

5. Application. HIVNET 012, conducted in Uganda, was a double-blinded 
controlled randomized trial of single dose nevirapine for the mother and new- 
born infant versus AZT administered only to the mother to prevent mother 
to child transmission (MTCT) of HIV. In spite of the success of nevaripine 
in reducing the risk of transmission, many infants still experienced MTCT 
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of HIV. To better understand HIV infection in young children, 128 of those 
infants were enrolled in a long term follow-up study and followed until age 5 
or death. CD4 cell percent and HIV viral load are known indicators of dis- 
ease progression. TLC is also being studied for its potential use in resource- 
poor settings where routine CD4 and viral load monitoring may be cost 
prohibitive. In this section we examine the association between longitudinal 
measures of CD4 cell percent, viral load and TLC and time until death using 
separate models (one biomarker per model) in the HIVNET 012 long term 
follow-up study using the methods proposed in the previous sections. 

Seventy infants died during follow-up. Jump points for the baseline hazard 
function were selected based on quantiles of the event times so that approxi- 
mately 8 events occurred between the jump points. Infants had between one 
and thirteen longitudinal measures with a median of four. Overall, there 
were a total of 594 measurements of CD4 percent, 603 measurements of vi- 
ral load and 763 measurements of TLC. 13% of the CD4 percent measures, 
7% of the viral load measures and 16% of the TLC measures are taken at 
time 0. In this analysis we placed the knots for the splines based on the 
quantiles of the data; therefore, this clumping of measurement times limits 
the number of knots we could select before we start placing multiple knots 
at 0, making the slope undefined. Therefore, for CD4 percent, we fit models 
with q = 5, ... ,10. For TLC, we fit models with q = 5, . . . , 9. For compari- 
son to potential models with more basis functions, we also fit models with 
equally spaced knots with q = 11 for CD4 percent and q = 10 for TLC. For 
viral load, we can fit models with q = 5, ... ,19. Additionally, we included 
age at first positive HIV test as a covariate in the hazard model; therefore, 
the interpretation of £ is the change in the log hazard associated with a 1 
month increase in the age of the infant at the time of the first positive HIV 
test. We implemented the Gaussian quadrature procedure using 10 points. 
We also ran models with higher q using 50 points and found no difference 
in the estimates. 

Table 1 shows the estimated values of the parameters of interest along with 
their 95% credible intervals obtained from fitting the data to three versions 
of the hazard model shown in equation (2.7), one with 7 S = and 7^ = 0, 
one with 7^ = and one with 7 S = 0, for a range of q. The Gibbs sampler 
was run twice with different starting values and seeds for the random num- 
ber generator for 100,000 iterations for each model with a burn-in of 10,000. 
Samples from every 10th iteration were saved to reduce possible autocorre- 
lation. The resulting sample was used to compute parameter estimates and 
credible intervals as well as DIC and LPML. Based on DIC, the minimum 
number of basis functions for a cubic B-spline, q = 5, is not adequate for 
any of the three longitudinal outcomes. Although not shown, models with 
equally spaced knots never outperformed the models with knots based on 
quantiles according to DIC or LPML. There were no meaningful differences 
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Results from the models for the three markers of HIV disease progression 
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-0.08 


;-o.i8, -o.oi) 


1649 


-735 




+slope 2.03 


(1.49,2.70) 


1.49 (0.91,2.25) 


-0.07 


(-0.19,0.02) 


1647 


-709 




+history 1.20 


(0.47, 1.99) 




3.28 (-0.97,8.38) -0.09 


^-0.19,-0.01) 


1653 


-734 




o = 8 


current 1.64 


(1.11,2.28) 




-0.09 


[—0.18, -0.01) 


1646 


-731 


K 




+slope 1.93 
+history 1.22 


(1.34,2.62) 
(0.47,2.04) 


0.88 (0.29,1.6) 


-0.09 

3.84 (-0.63,9.59) -0.09 


(-0.19,0) 
[-0.21,-0.01) 


1616 
1655 


-715 
-728 


M 
ES 
O 


g = 9 


current 1.62 


(1.09,2.27) 




-0.08 


[-0.18,-0.01) 


1662 


-730 






+slope 1.92 


(1.31,2.66) 


0.98 (-0.32,2.22) 


-0.08 


(-0.19,0.00) 


1633 


-714 


'A 




+history 1.26 


(0.54,2.07) 




3.27 (-1.07,8.78) -0.10 


[-0.20,-0.03) 


1668 


-725 


CD 4 
















= 5 


current —0.83 


[-1.17,-0.54) 




-0.12 


[-0.22,-0.05) 


1767 


-743 




+slopc -0.87 


[-1.22,-0.56) 


-1.81 (-4.24,0.37) 


-0.13 


[-0.23,-0.05) 


1764 


-732 




+history —0.6 ( 


-1.03,-0.19) 




-2.05 (-4.75,0.36) -0.13 


[-0.23,-0.05) 


1747 


-731 


g = 6 


current —0.81 


;-1.15,-0.51) 




-0.12 


[-0.22,-0.04) 


1738 


-741 




+slopc -0.91 


[-1.29,-0.58) 


-1.47 (-3.32,0.80) 


-0.12 


[-0.22,-0.05) 


1728 


-730 




+history —0.57 


-1.02.-0.1.-)) 




-1.76 (-4.14,0.38) -0.13 


[-0.23,-0.05) 


1722 


-726 




current —0.96 


[-1.34,-0.62) 




-0.12 


[-0.22,-0.05) 


1674 


-731 






+slope -0.98 


;-1.39,-0.63) 


-0.38 (-1.36,0.82) 


-0.12 


[-0.22,-0.05) 


1672 


-719 
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in the estimates between the results from the two samplers. Convergence was 
assessed using diagnostic tools provided in the CODA package [Plummer et 
al. (2006)]. 

For viral load, DIC selected q = 6 for all three models. The LPML in- 
creased as q increased except for the model with 7 and 7 S where it selected 
q = 7; however, the value was not very different than that for q = 6. Here, we 
focus on the results for q = 6. The point estimate of 7 indicates an increased 
risk of death with increasing viral load such that a 10-fold increase in viral 
load is associated with a 16.3-fold increase in the hazard [95% credible inter- 
val (CI): (4.4, 74.5)]. The rate of change in viral load is also associated with 
risk of death. At a known level of viral load, a unit increase per month in the 
rate of change of log viral load is associated with a 2. 8- fold increase in the 
hazard [95% CI: (1.15, 7.4)]. The estimate of the strength of this association 
decreases as q increases. However, increasing q also introduces more fluc- 
tuations in the estimate of the slope over time that may not be supported 
by the data. Additionally, the history of viral load is also associated with 
risk of death, with a one unit increase in the mean value of the trajectory 
of log viral load times a unit change in the length (in months) of follow-up 
being associated with 1.04-fold increase in the hazard. In plainer terms, if 
two infants have been followed for 7 months, with a difference in mean val- 
ues equal to 1, the hazard ratio would be exp(7 * 4.3/100) = 1.35. If after 12 
months the difference in mean levels was still equal to 1, the hazard ratio 
would be 1.68. Figure 1 shows the trajectory fits for viral load with q = 6. 
The model fits a variety of shapes suggested by the data. For example, the 
initial steep rise in infants 2, 10 and 12 is fit well, as is the initial decrease 
in infants 6 and 9. 

For CD4 percent, the DIC selected q = 7 for all three models. The LPML 
selected q = 7 for the current value and current value plus slope models and 
q = 8 for the current value plus history model (although the value was close 
to that for q = 7). Here, we focus on the results for q = 7. A 10 unit decrease 
in CD4 percent is associated with a 2.6-fold increase in the hazard [95% 
CI: (1.86, 3.82)]. There is no suggestion of a further association between 
risk of death and the change in CD4 percent or its history. Figure 2 shows 
the fit of the spline model to the observed CD4 percent data for 12 infants. 
The majority of the infants in Figure 2 have an initial sharp decline in CD4 
percent which is fit well by the model without forcing decreases where the 
data do not suggest it (infant 6). 

For TLC, the DIC and LPML selected q = 9. The density of measurements 
near time zero forced quantile-based knots very close together if not equal 
shortly after infection leading to undefined slopes for models with q > 9. 
Using equally spaced knots did not result in improved DIC or LPML. Here 
we focus on the results for q = 9. A possible association between TLC and 
risk of death was suggested in this model with a 1000 unit increase in TLC 
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Fig. 1. Longitudinal profiles of viral load for 12 infants. Circles mark the observed values. 
The solid black line indicates the fit from the model. The red line represents the fitted 
cumulative value. The green line represents the fitted slope. A vertical grey line indicates 
time of death if it was observed. 
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Fig. 2. Longitudinal profiles of CD4 percent for 12 infants. Circles mark the observed 
values. The solid black line represents the fit from the model. The red line represents 
the fitted cumulative value. The green line represents the fitted slope. A vertical grey line 
indicates time of death if it was observed. 
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being associated with a 34% [95% CI: (20, 49)] decrease in the risk of death. 
Figure 3 shows the fit from the longitudinal models for TLC. 

The models all estimate a similar association between age at first positive 
test and risk of death, with a one month increase in age at first positive test 
associated with an approximate 10% reduction in risk given the trajectory 
of the longitudinal marker. 

Figure 4 shows the posterior estimate of the cumulative hazards for the 
three markers when only the current value of the marker is included in the 
model. TLC provide better fits to the Kaplan-Meier estimate in the first 
year, while viral load provides a better fit between 2 and 3 years. 

We propose using ROC curves for censored data [Heagerty, Lumley and 
Pepe (2000)] as an additional comparison of joint longitudinal and survival 
models. We plotted ROC curves for predicting death within 1 year based 
on linear predictor, ~fip(t) + "y s ip'(t) + (i)) + Z-(, at t = 6, 12 and 

18 months after infection (Figure 5). Taking t = 6 months as an example, 
we treat the baseline time for survival as 6 months, and calculate the ROC 
curve for death within 12 months (18 months after infection) with the linear 
predictor calculated at 6 months as the biomarker. Infants who died or were 
censored before 6 months are not included. The same procedure was used 
for 12 and 18 months. These results do not suggest any large improvement 
in prediction when either the rate of change in or history of the biomarker 
are included in the models, except possibly for including slope in the viral 
load model. This agrees with the results from the model and the DIC and 
LPML. Overall, TLC does not compare favorably to either viral load or CD4 
percent in predicting death within one year, and viral load appears to be 
the best predictor. 

6. Discussion. Understanding the relationship between trends in a 
biomarker and risk of an event can yield important insight into the mecha- 
nisms of disease progression. Clearly, this is recognized scientifically, as the 
Journal of the American Medical Association recently made an exception to 
its own policies on publishing data over five years old to report an analysis 
of the association between CD4 slopes estimated from linear mixed effects 
models and time to development of AIDS or death in antiretroviral naive 
HIV-infected adults [Mellors et al. (2007)]. Here, we present a model that 
goes several steps further to addressing that question by jointly modeling 
the time to event and the slope and looking at local changes as opposed 
to long term trends. Additionally, we do not restrict ourselves to a linear 
model that assumes constant rate of change. This model is motivated and 
illustrated by a long-term follow-up study of HIV-infected children in Africa. 
However, it can be used in many settings where understanding the relation- 
ship between trends in a biomarker and time to an event is of interest. Other 
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FlG. 3. Longitudinal profiles of TLC for 16 infants with q — 6. Circles mark the observed 
values. The solid black line indicates the fitted trajectory from the model. The red line 
represents the fitted cumulative value. The green line represents the fitted slope. A vertical 
grey line indicates time of death if it was observed. 
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Months since first positive HIV test 

Fig. 4. Fitted cumulative hazard curves for the selected models for TLC, viral load and 
CD4 percent. The solid and dashed stepped lines represent the Kaplan-Meier fit with 95% 
confidence intervals, with the small vertical lines representing censored observation times. 

examples may include cognitive function and death or subclinical coronary 
disease and clinical coronary events. 

This model gives interpretable parameters, although these values are dif- 
ficult to translate into practice. For example, although we estimate that a 
10-fold (1 log unit) increase per month in viral load is associated with a 
2.8-fold increase in the hazard, we cannot easily measure the instantaneous 
change in viral load in practice. To do so, we would have to measure it fre- 
quently, which is not feasible in practice, especially in resource-poor settings. 
However, this approach may indicate if there is more information available in 
collected information than just the current value and what that information 
might be (change versus average history, for example). 

We have proposed a model that provides additional insight into the bi- 
ological processes of disease progression by linking the hazard of the event 
to the rate of change or cumulative history of a biomarker. This model ex- 
pands the possibilities of examining disease progression within the class of 
joint longitudinal and survival models. 

APPENDIX: SAMPLING FROM THE POSTERIOR 

We use Gibbs sampling to sample from the joint posterior distribution 
of the parameters: (3, a, 7, 7 S , 7/1, A, b Q and Vo- The joint posterior does 
not have a closed form; however, given that the conditional posteriors either 
have a closed form or are log-concave, implementation of the Gibbs sampler 
is straight-forward. Let D denote the data and rest denote the remaining 
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Fig. 5. ROC curves for predicting death within one year based on the hazard function 
calculated at 6, 12 and 18 months after initial positive HIV test for current value (solid 
line), current value and slope (dashed line) and current value and history (dotted line) 
models. The areas under the ROC curves are shown with each plot. 

parameters. Then at each iteration of the Gibbs sampler, we proceed as 
follows: 

1. Let flu = (Pin, . . . , Paq)', i = 1, • • • ,N, I = 1, . . . , L. Then use ARS to sam- 
ple [Pu\rest,D] from 



p(Pi | rest , D) oc exp < 



ij=i 

L 

J2(Pil - boi ~ X' i a l )'V ol 1 (p a - b oi - X[ ai ) 
i=i 



TRENDS IN A BIOMARKER AND RISK OF EVENT 

x exp|i/i(7V(«i) + 7>'(*i) + 7/> (_1) (*i) + ^C) 

-e^^^(/3,7, 7s ,7 fc ,A)}. 
i=i J 

2. Sample 

~ Wishart ^ hS"* + ^(A* - b i - x-a)(/3 iZ - b m - x' i a)'\ , 



iV + M 



f'o/ 



3. Sample 

[S -1 |re«f,i5] 

/ / N mi \ -1 

~ Wishart ^ J^ 1 + - ^(ty))(ly - ^(tii))'J , 



TV \ 

^2 rrn + z/s J • 

i=l / 



4. Let 60/ = (&ozii • ■ • j &0Zg)'> i = lj • • • j N, I = 1, . . . , L. Then sample 

[60/ 1 rest , D] ~ iV 2 (/x bo! , E& M ) , 

where 



and 

-1 , /i— 1\— l 



5. Use ARS to sample [7, 7^, 7/ l |rest, D] from 

p(7|resi,D) ocexp{x(^( 7 V(si) +7>'(si) + 7/> (_1) (si) + *iC) 



e***^ 7, 7,, 7/,, A) 



x exp|-i((7,7 s ,7 /l ) / -g )'g 1 1 ((7,7«,7h)' - 3o)|- 
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6. Sample [Xj\rest,D] = gamma (d Qj + nj, Y^Li e z '^Hij(P, 7, Is, 7h, 1) + dij), 
where nj is the number of events in the jth interval and 7, 1) is 
Hijiflililsilhi ^) evaluated with Xj = 1. 

7. Sample 

[a|n&si,D] ~ iV p (/i a , S a ), 

where 
and 

S Q = ( ^ Xil q V Ql lqXl + c 1 i j , 
where l q is a vector of ones of length q. 
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